AI正在经历范式转变,随着模型的兴起(例如Bert,Dall-E,GPT-3),这些模型经过大规模的数据训练,并且可以适应广泛的下游任务。我们称这些模型基础模型来强调其至关重要但不完整的特征。该报告提供了基础模型的机会和风险的详尽说明,包括其功能(例如语言,愿景,机器人技术,推理,人类互动)和技术原则(例如,模型架构,培训程序,数据,系统,安全,安全性,评估,理论)对其应用(例如法律,医疗保健,教育)和社会影响(例如不平等,滥用,经济和环境影响,法律和道德考虑)。尽管基础模型基于标准的深度学习和转移学习,但它们的规模导致了新的新兴能力,以及它们在许多任务中的有效性都激发了同质化。同质化提供了强大的杠杆作用,但要求谨慎,因为基础模型的缺陷均由下游的所有适应模型继承。尽管即将广泛地部署基础模型,但我们目前对它们的工作方式,失败以及由于其新兴属性的影响而缺乏清晰的了解。为了解决这些问题,我们认为基础模型的许多批判性研究都需要与他们的基本社会技术性质相称。
translated by 谷歌翻译
Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in
translated by 谷歌翻译
Recent advances in upper limb prostheses have led to significant improvements in the number of movements provided by the robotic limb. However, the method for controlling multiple degrees of freedom via user-generated signals remains challenging. To address this issue, various machine learning controllers have been developed to better predict movement intent. As these controllers become more intelligent and take on more autonomy in the system, the traditional approach of representing the human-machine interface as a human controlling a tool becomes limiting. One possible approach to improve the understanding of these interfaces is to model them as collaborative, multi-agent systems through the lens of joint action. The field of joint action has been commonly applied to two human partners who are trying to work jointly together to achieve a task, such as singing or moving a table together, by effecting coordinated change in their shared environment. In this work, we compare different prosthesis controllers (proportional electromyography with sequential switching, pattern recognition, and adaptive switching) in terms of how they present the hallmarks of joint action. The results of the comparison lead to a new perspective for understanding how existing myoelectric systems relate to each other, along with recommendations for how to improve these systems by increasing the collaborative communication between each partner.
translated by 谷歌翻译
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
translated by 谷歌翻译
Generating a chain of thought (CoT) can increase large language model (LLM) performance on a wide range of tasks. Zero-shot CoT evaluations, however, have been conducted primarily on logical tasks (e.g. arithmetic, commonsense QA). In this paper, we perform a controlled evaluation of zero-shot CoT across two sensitive domains: harmful questions and stereotype benchmarks. We find that using zero-shot CoT reasoning in a prompt can significantly increase a model's likelihood to produce undesirable output. Without future advances in alignment or explicit mitigation instructions, zero-shot CoT should be avoided on tasks where models can make inferences about marginalized groups or harmful topics.
translated by 谷歌翻译
We consider the problem of finding an accurate representation of neuron shapes, extracting sub-cellular features, and classifying neurons based on neuron shapes. In neuroscience research, the skeleton representation is often used as a compact and abstract representation of neuron shapes. However, existing methods are limited to getting and analyzing "curve" skeletons which can only be applied for tubular shapes. This paper presents a 3D neuron morphology analysis method for more general and complex neuron shapes. First, we introduce the concept of skeleton mesh to represent general neuron shapes and propose a novel method for computing mesh representations from 3D surface point clouds. A skeleton graph is then obtained from skeleton mesh and is used to extract sub-cellular features. Finally, an unsupervised learning method is used to embed the skeleton graph for neuron classification. Extensive experiment results are provided and demonstrate the robustness of our method to analyze neuron morphology.
translated by 谷歌翻译
Prior work has identified a resilient phenomenon that threatens the performance of human-AI decision-making teams: overreliance, when people agree with an AI, even when it is incorrect. Surprisingly, overreliance does not reduce when the AI produces explanations for its predictions, compared to only providing predictions. Some have argued that overreliance results from cognitive biases or uncalibrated trust, attributing overreliance to an inevitability of human cognition. By contrast, our paper argues that people strategically choose whether or not to engage with an AI explanation, demonstrating empirically that there are scenarios where AI explanations reduce overreliance. To achieve this, we formalize this strategic choice in a cost-benefit framework, where the costs and benefits of engaging with the task are weighed against the costs and benefits of relying on the AI. We manipulate the costs and benefits in a maze task, where participants collaborate with a simulated AI to find the exit of a maze. Through 5 studies (N = 731), we find that costs such as task difficulty (Study 1), explanation difficulty (Study 2, 3), and benefits such as monetary compensation (Study 4) affect overreliance. Finally, Study 5 adapts the Cognitive Effort Discounting paradigm to quantify the utility of different explanations, providing further support for our framework. Our results suggest that some of the null effects found in literature could be due in part to the explanation not sufficiently reducing the costs of verifying the AI's prediction.
translated by 谷歌翻译
SchNetPack is a versatile neural networks toolbox that addresses both the requirements of method development and application of atomistic machine learning. Version 2.0 comes with an improved data pipeline, modules for equivariant neural networks as well as a PyTorch implementation of molecular dynamics. An optional integration with PyTorch Lightning and the Hydra configuration framework powers a flexible command-line interface. This makes SchNetPack 2.0 easily extendable with custom code and ready for complex training task such as generation of 3d molecular structures.
translated by 谷歌翻译
The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we: - announce the first public release of the astroBERT language model; - show how astroBERT improves over existing public language models on astrophysics specific tasks; - and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译